Update FRASER for test build #272

Merged
Johnnyassaf merged 90 commits into main from update-fraser-version
Feb 18, 2026

Conversation

@MattWellie
Contributor

No description provided.

@github-actions
github-actions bot commented Jan 6, 2026

🐳 Expected Tags

Some Dockerfiles have been modified in this PR. New tags are expected to be deployed.
The table below shows what the tags are expected to be on merging this PR.

| Name | Tag |
| --- | --- |
| fraser | 2.4.6-1 |

Contributor

@Johnnyassaf Johnnyassaf left a comment

Solved by changing the pinned R version from 4.3.1 to 4.4.3. If we want to keep R at the older version, we need to keep FRASER at 1.14.1, which would not be ideal.

r-base=4.4.3 \
r-reshape2=1.4.5 \
r-tidyverse=2.0.0 \
bioconductor-fraser=2.4.6 \
Contributor Author

Suggested change (replace the first line with the second):
bioconductor-fraser=2.4.6 \
bioconductor-fraser=${VERSION} \

This should use the tag directly.

@Johnnyassaf merged commit 7f0ed17 into main on Feb 18, 2026
7 checks passed
@Johnnyassaf deleted the update-fraser-version branch on February 18, 2026, 03:05
Johnnyassaf added a commit to populationgenomics/cpg-flow-rdrnaseq that referenced this pull request on Feb 18, 2026

## Refactor FRASER pipeline with separated R scripts and improved job management

### Summary

Major refactoring of the FRASER aberrant splicing analysis pipeline, including version bump from 0.2.5 to 1.0.0.

### Changes

#### FRASER Pipeline Refactoring (`src/rdrnaseq/jobs/fraser.py`)
- **Separated R scripts**: Split the monolithic FRASER R code into modular scripts that are baked into, and retrieved from, the FRASER image (populationgenomics/images#272):
  - `fraser_init.R` - Initialize FRASER object
  - `fraser_count_split.R` - Count split reads per sample
  - `fraser_merge_split.R` - Merge split read counts across samples
  - `fraser_count_non_split.R` - Count non-split reads per sample
  - `fraser_merge_non_split.R` - Merge non-split read counts
  - `fraser_join_counts.R` - Join all count matrices
  - `fraser_analysis.R` - Run FRASER analysis and generate results

- **New `get_fraser_job()` helper function**: Centralized job creation with consistent resource allocation based on `MachineType` (HIGHMEM/STANDARD)

- **Configurable job resources**: Added config-driven storage and CPU allocation:
  - `cohort_job_resources`: for multi-sample jobs (`base_storage_gb`, `per_bam_storage`, `ncpu`)
  - `sample_job_resources`: for single-sample jobs

- **Improved job dependency management**: Fixed job dependency chains to ensure proper execution order across pipeline stages

- **File caching fixes**: Implemented proper output existence checking to skip completed steps

- **Storage optimization**: Intermediate results now saved to temporary GCS storage, with significant results saved permanently
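
As a rough illustration of the dependency ordering and output-existence checks described above, the following sketch plans which scripts still need to run. It is not the code in `src/rdrnaseq/jobs/fraser.py`: the `plan_jobs` function, the `outputs` mapping, and the `exists` callback are hypothetical, and the real pipeline fans the counting steps out per sample rather than running one linear chain.

```python
from typing import Callable, Dict, List, Optional

# Script names in the order they are listed above.
FRASER_SCRIPTS = [
    "fraser_init.R",
    "fraser_count_split.R",
    "fraser_merge_split.R",
    "fraser_count_non_split.R",
    "fraser_merge_non_split.R",
    "fraser_join_counts.R",
    "fraser_analysis.R",
]


def plan_jobs(outputs: Dict[str, str], exists: Callable[[str], bool]) -> List[dict]:
    """Plan a simplified linear chain of FRASER steps.

    `outputs` maps each script name to its output path (hypothetical layout).
    `exists` reports whether a path is already present, e.g. a GCS lookup.
    A step whose output exists is skipped; every other step depends on the
    previously planned step, so the listed order is preserved.
    """
    plan: List[dict] = []
    previous: Optional[str] = None
    for script in FRASER_SCRIPTS:
        if exists(outputs[script]):
            continue  # output already present, nothing to run
        plan.append({"script": script, "depends_on": previous})
        previous = script
    return plan
```

Passing `exists` as a callback keeps the sketch independent of any storage client, so the same planning logic can be exercised against an in-memory set of paths.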

#### Stage Updates (`src/rdrnaseq/stages.py`)
- Updated Fraser stage with new analysis keys: `Rds_data`, `seqr_data`, `sig_results`
- Improved CRAM→BAM conversion tracking to avoid duplicate conversions across stages
- Added `samples_needing_bams` tracking dict for cross-stage job dependencies
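
One way a tracking dict like `samples_needing_bams` can prevent duplicate conversions is to create each conversion job at most once and hand the same job back to every stage that asks for it. The sketch below assumes the dict maps a sample ID to its conversion job; the `get_or_create_conversion` helper and the `make_job` callback are hypothetical names, not taken from `src/rdrnaseq/stages.py`.

```python
from typing import Any, Callable, Dict


def get_or_create_conversion(
    sample_id: str,
    samples_needing_bams: Dict[str, Any],
    make_job: Callable[[str], Any],
) -> Any:
    """Return the CRAM-to-BAM conversion job for a sample.

    The job is created on first request and stored in the shared dict, so a
    later stage asking for the same sample reuses it and can depend on it
    instead of scheduling a second conversion.
    """
    if sample_id not in samples_needing_bams:
        samples_needing_bams[sample_id] = make_job(sample_id)
    return samples_needing_bams[sample_id]
```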

#### Configuration (`src/rdrnaseq/config_template.toml`)
- Added `cohort_job_resources` section for cohort-level job configuration
- Added `sample_job_resources` section for sample-level job configuration
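
A minimal sketch of how these sections might be consumed follows. It makes several assumptions: that `per_bam_storage` is expressed in GB, that storage scales linearly with the number of BAMs, and that `sample_job_resources` uses the same keys as `cohort_job_resources`. The `JobResources` class, the `resources_from_config` function, and the values in `EXAMPLE_CONFIG` are illustrative only and do not come from the repository.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class JobResources:
    base_storage_gb: int
    per_bam_storage: int  # assumed to be GB per BAM
    ncpu: int

    def storage_gb(self, n_bams: int) -> int:
        """Assumed linear model: fixed base plus a per-BAM increment."""
        return self.base_storage_gb + self.per_bam_storage * n_bams


def resources_from_config(config: Dict[str, dict], section: str) -> JobResources:
    """Read one resource section from an already-parsed config mapping."""
    values = config[section]
    return JobResources(
        base_storage_gb=int(values["base_storage_gb"]),
        per_bam_storage=int(values["per_bam_storage"]),
        ncpu=int(values["ncpu"]),
    )


# Illustrative values only, standing in for a parsed config_template.toml.
EXAMPLE_CONFIG = {
    "cohort_job_resources": {"base_storage_gb": 10, "per_bam_storage": 5, "ncpu": 4},
    "sample_job_resources": {"base_storage_gb": 5, "per_bam_storage": 5, "ncpu": 2},
}
```

With these example values, `resources_from_config(EXAMPLE_CONFIG, "cohort_job_resources").storage_gb(20)` evaluates to 110.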

#### Cleanup
- Removed deprecated `picard.py` module (465 lines deleted)
- Updated README with current pipeline usage
- Updated Docker image reference
- Version bump: 0.2.5 → 1.0.0

Co-authored-by: Matt Welland <mattwellie@gmail.com>